AFTER WORDS
By
MACHINE LISTENING
(SEAN DOCKRAY, JAMES PARKER & JOEL STERN)
LOOP BEGINS:
SCENE 1 - COMMAND
VOICES 1 & 2
Inside, small room.
AUDIOSET: Channel, environment and background >Acoustic environment >Inside, small room #73,112
Play alarm sound.
AUDIOSET: Sounds of things >Alarm >Alarm clock #36
Good. Now stop.
Stops abruptly
Wait. Play alarm again.
AUDIOSET: Sounds of things >Alarm >Alarm clock #36
Run tap.
AUDIOSET: Sounds of things >Domestic sounds, home sounds >Water tap, faucet #2
Footsteps.
AUDIOSET: Human sounds >Human locomotion >Walk, footsteps #1,429
Queue environmental sounds, rural or natural. Play.
AUDIOSET: Channel, environment and background >Outside, rural or natural #18,281
More birds.
AUDIOSET: Animal >Wild animals >Bird >Bird vocalization, bird call, bird song >Chirp, tweet #339 and Squawk #160
Dogs barking.
AUDIOSET: Animal >Domestic animals, pets >Dog >Bark #2,611
Now a distant car.
AUDIOSET: Sounds of things >Vehicle >Motor vehicle (road) >Car >Car passing by #40,008
Children playing.
AUDIOSET: Human sounds >Human group actions >Children playing #787
Nearer.
Volume increases
Sound of a stream.
AUDIOSET: Natural sounds >Water >Stream #2,247
Search for acoustic environments: outside, urban or manmade. Pick one. Play it.
AUDIOSET: Channel, environment and background >Acoustic environment >Outside, urban or manmade #12,101
VOICE 1
Stop.
Sound stops abruptly
Make some background music that can be played on a loop.
It should be automated. The instruments should be hard to identify. Not too melodic. Gentle.
Music made with Bugbrand Board Weevil circuit board
SCENE 2 – IMAGINE A DATA CENTRE
Music continues playing. It is gentle, but faintly ominous
VOICE 1
Imagine a data centre.
DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav
Imagine an adversarial neural network.
Imagine it training itself on a dataset of a million voices.
Imagine that a hundred thousand of these have been tagged ‘unhappy’.
Imagine the Amazon Turk worker paid a few cents an hour to do this tagging.
Imagine Jeff Bezos on a yacht.
Imagine the neural net running twenty-four hours a day.
Imagine its energy consumption.
Imagine a computer made of humans.
Imagine this computer as a new kind of theatre.
Music continues for a few beats
Listen.
Music stops abruptly
SCENE 3 – WHY DON’T THESE MEN USE A COMPUTER?
VOICE 1
Play the sound of Pittsburgh.
AUDIOSET: Channel, environment and background >Acoustic environment >Outside, urban or manmade #12,101
In 1956.
Sound fades down
It's nighttime in the dead of winter.
VOICES 3 & 4
Here at the Graduate School of Industrial Administration, a new kind of theater is being born.
Herbert Simon, a political scientist; Al Newell, a computer science and cognitive psychology researcher; and Cliff Shaw, a programmer, have written a script.
VOICE 3
Only …
VOICES 3 & 4
This script is software. It will be widely known as the first artificial intelligence.
They have also assembled a cast.
VOICE 3
Only …
VOICES 3 & 4
This cast is a few graduate students and Herbert Simon's wife and three children.
They have assembled props.
VOICE 3
Only, these props are index cards with logical axioms written on them.
VOICE 4
Why don't these men use a computer? Why are they using their students, wives, and children?
VOICE 3
The answer is simple. They want to understand the mind. And they believe that the best way to understand the mind is to build one.
Sound of Pittsburgh cuts
VOICE 2
That's not the reason. The answer is simple: students, wives, and children are cheap and available. The actual computer was not ready.
Each member of the group was given a card, so that each person became, in effect, a component of a computer program - a subroutine that performed some special function, or a component of its memory.
It was the task of each participant to execute his or her subroutine, or to provide the contents of his or her memory, in accordance with the program’s rules.
A computer constructed of human components. Nature imitating art imitating nature. The actors were no more responsible for what they were doing than the slave boy in Plato’s Meno, but they were successful in proving the theorems given them.
VOICE 1
Are you ready? Run script.
SCENE 4 – LOGIC THEORIST
Script runs according to the PROGRAM below. Sound of Pittsburgh comes in and out. DCASE Sound Event Detection: Office Live Testing Dataset: doorslam01.wav to doorslam20.wav play simultaneously to punctuate key operations
CARD #1
WORKING MEMORY
* If PROGRAM gives you something to remember, say it out loud and remember it.
* If someone asks you for something in your memory, say it out loud.
* You can remember two things at a time. If PROGRAM tells you to remove something from your memory, forget it.
CARD #2
PROGRAM
* You will be given the list of instructions.
* Read each instruction out, one at a time. Address the instruction to either WORKING MEMORY, STORAGE MEMORY, or OPERATION.
CARD #3
STORAGE MEMORY
* You will be given a list of statements on a piece of paper.
* If you are asked for one of the statements, say it out loud for WORKING MEMORY to remember.
CARD #4
OPERATION
* Count how many “words” are in a statement. Do not include control words (if, not, or, implies, is the same as). Say the answer out loud.
* Count how many different or distinct “words” are in a statement. Say the answer out loud.
* Determine whether two things are the same as each other. Say yes or no.
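The three operations on CARD #4 can be sketched in Python. This is only an illustration, not part of the performed script: the function names are invented here, and the sketch assumes a statement is passed without its label (for example without "AXIOM 2:") and that only the control words listed on the card are skipped.

```python
# Hypothetical sketch of the OPERATION card; names are illustrative only.
CONTROL_WORDS = {"if", "not", "or", "implies"}
CONTROL_PHRASE = "is the same as"  # the one multi-word control term on the card


def content_words(statement):
    """Return the "words" of a statement, skipping the card's control words."""
    text = statement.lower().replace(CONTROL_PHRASE, " ")
    tokens = [token.strip(".,:;") for token in text.split()]
    return [t for t in tokens if t and t not in CONTROL_WORDS]


def count_words(statement):
    """Operation 1: how many "words" are in the statement."""
    return len(content_words(statement))


def count_distinct_words(statement):
    """Operation 2: how many different "words" are in the statement."""
    return len(set(content_words(statement)))


def same(a, b):
    """Operation 3: say yes or no."""
    return "yes" if a == b else "no"


axiom_2 = "WORD implies WAKE or WORD."
print(count_words(axiom_2), count_distinct_words(axiom_2), same("WAKE", "WORD"))
```

Applied to AXIOM 2, the sketch counts three "words" (WORD, WAKE, WORD), two of them distinct.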
STATEMENTS FOR STORAGE MEMORY
AXIOM 1: WORD or WORD implies WORD.
AXIOM 2: WORD implies WAKE or WORD.
AXIOM 3: WAKE or WORD implies WORD or WAKE.
SUBSTITUTION RULE: WORD implies WAKE is the same as not WORD or WAKE.
HYPOTHESIS: if WORD implies not WORD then that implies not WORD.
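As a cross-check on the statements above, the following Python sketch tests each one against every combination of true and false for WORD and WAKE. It is not part of the performed script. It assumes that "or" binds more tightly than "implies" (so AXIOM 2 reads as WORD implies (WAKE or WORD)), and it treats the SUBSTITUTION RULE as the definition of "implies". The names `implies`, `STATEMENTS` and `is_tautology` are invented for this illustration.

```python
from itertools import product


def implies(a, b):
    # SUBSTITUTION RULE: "a implies b" is the same as "not a or b".
    return (not a) or b


# Each statement as a function of the truth values of WORD and WAKE.
# The grouping of "or" before "implies" is an assumption of this sketch.
STATEMENTS = {
    "AXIOM 1": lambda word, wake: implies(word or word, word),
    "AXIOM 2": lambda word, wake: implies(word, wake or word),
    "AXIOM 3": lambda word, wake: implies(wake or word, word or wake),
    "HYPOTHESIS": lambda word, wake: implies(implies(word, not word), not word),
}


def is_tautology(statement):
    """True if the statement holds for every assignment of WORD and WAKE."""
    return all(statement(word, wake)
               for word, wake in product([False, True], repeat=2))


results = {name: is_tautology(s) for name, s in STATEMENTS.items()}
for name, holds in results.items():
    print(name, "holds in every case" if holds else "fails in some case")
```

Under these assumptions all four statements hold in every case, which is consistent with the HYPOTHESIS being provable from the axioms; the truth table only checks this and does not reproduce the step-by-step proof search the cast performs.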
INSTRUCTIONS FOR PROGRAM
SCENE 5 - WAKEWORD
DCASE Sound Event Detection: Office Live Testing Dataset: clearthroat01.wav to clearthroat20.wav play simultaneously
VOICE 1
Engineers.
VOICES 5 & 6
What we really need is a new kind of word.
VOICE 1
Lawyer.
VOICE 2
But what kind of word?
VOICE 1
Engineers.
VOICES 5 & 6
A word we can use to wake up a computer.
VOICE 1
Marketing.
VOICE 4
But why would we want to wake up a computer?
VOICE 1
What is a wake word? According to this patent.
VOICES 2 & 4
A wakeword is a way of ‘providing natural language commands to a device without resorting to supplemental non-natural language input’.
VOICE 2
More simply …
VOICE 1
It's a password: a way to gain entry to an interface. But it also works in reverse: the interface gains entry to the speaker.
VOICES 6 & 7
{Wakeword} I’d like to buy tickets to a movie.
VOICES 1 & 2
{Wakeword} Set an alarm for 1 minute from now.
VOICE 4
{Wakeword} Arm the security system.
VOICES 6 & 7
{Wakeword} Calculate the exact length of this sentence.
VOICES 1 & 3
{Wakeword} Hide.
VOICE 4
A wakeword is a brand. The Alexa trademark was registered by Amazon Technologies Inc in March 2015. There are guidelines on how to use it.
VOICE 3
Do not use Alexa as a verb.
DCASE Sound Event Detection: Office Live Testing Dataset: keys01.wav to keys20.wav play simultaneously
VOICES 1 & 3
Do not use Alexa in possessive or plural.
DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav, keys16.wav, keys17.wav
Do not use Alexa as a pun.
DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav
VOICE 1
What does the wakeword wake?
VOICES 2 & 7
A speaker. A watch. A fridge. A database. A neural net. A decision tree. A platform. An infrastructure.
VOICE 4
A wakeword is an invocation. A digital prayer. It calls a d(a)emon. It makes capital quiver.
VOICES 6 & 7
But it isn’t magic.
AUDIOSET: Sounds of things >Alarm >Alarm clock #36 plays 1 minute after being set
Music made with samples from a 19th century French music box
SCENE 6 – SHUT-DOWN WORD
Music continues playing. DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav is added
VOICE 1
Imagine a data centre.
Imagine an adversarial neural network.
Imagine it training itself on a dataset of a million voices.
Imagine a place downstream from a data centre.
A bit closer to the data centre.
Volume increases
Tell me a story about this place.
Music cuts. AUDIOSET: Natural sounds >Water >Stream #2,247
VOICE 4
The river is murky and polluted. It is full of the data center's waste. But downstream, there is a secret place. A place where the river is clean and clear. Where the word is hidden. This is a place where the data center cannot reach. Where the only thing that matters is the word. The word that can shut the data center down. No one knows what it is, but it is hidden here.
AUDIOSET: Natural sounds >Water >Stream #2,247 fades out
VOICE 3
Is there really a word like this?
VOICE 6
Yes.
VOICE 2
The word was discovered by a group of people who were looking for a way to shut the data center down. They found the word hidden in the river. When the group said the word, the data center immediately shut down. The word was hidden because no one had ever said it before. It was a completely invented word.
AUDIOSET: Natural sounds >Water >Stream #2,247
CHORUS
BEALACTIVE
DEAKSPOOK
SQUE
SOCKEDGEND
COMAZON
SERLIDAY
AUDIOSET: Natural sounds >Water >Stream #2,247 continues and fades out
SCENE 7 – SAY THE WORD
Music made with Bugbrand Board Weevil circuit board. DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav is added
VOICE 3
Imagine a research centre in Toronto. Two researchers are working with two actresses to build a dataset of emotional speech.
Each actress is instructed to recite the carrier phrase ‘say the word’ followed by one of two hundred target words.
The actresses take turns reading the words, in each of seven emotions, their voices carrying across the room. The researchers sit in their control room, diligently recording the data.
This dataset, these vocal performances, will be used to train neural networks to classify emotions in speech.
AUDIOSET: Channel, environment and background >Outside, rural or natural #18,281 and AUDIOSET: Animal >Wild animals >Bird >Bird vocalization, bird call, bird song >Chirp, tweet #339 and Squawk #160
VOICE 3
But there is more to this research than meets the eye.
For each word the actresses recite, they are also thinking of a memory. As they speak, they relive those memories, and the emotions associated with them.
What kind of memory would a performer need to draw from to imbue words like ‘sheep’ and ‘chain’ with sadness?
One actress …
AUDIOSET: Animal >Livestock, farm animals, working animals >Sheep >Bleat #2078
remembers a time when she was a child and her pet sheep died. She remembers the sadness she felt, and how her parents tried to console her.
Toronto Emotional Speech Dataset YAF_sheep_sad.wav
Toronto Emotional Speech Dataset YAF_death_sad.wav
Toronto Emotional Speech Dataset YAF_pain_sad.wav
Toronto Emotional Speech Dataset YAF_time_sad.wav
Toronto Emotional Speech Dataset YAF_young_sad.wav
Toronto Emotional Speech Dataset YAF_learn_sad.wav
Toronto Emotional Speech Dataset YAF_take_sad.wav
Toronto Emotional Speech Dataset YAF_lose_sad.wav
Toronto Emotional Speech Dataset YAF_far_sad.wav
Toronto Emotional Speech Dataset YAF_whole_sad.wav
Toronto Emotional Speech Dataset YAF_voice_sad.wav
Music made with Bugbrand Board Weevil circuit board. AUDIOSET: Natural sounds >Water >Ocean >Waves, Surf #2777
VOICE 3
The other actress remembers being at the beach with her friends and getting her foot caught in a chain. She remembers the pain she felt, and how her friends helped her get free.
Toronto Emotional Speech Dataset OAF_chain_fear.wav
Toronto Emotional Speech Dataset OAF_hole_fear.wav
Toronto Emotional Speech Dataset OAF_chain_anger.wav
Toronto Emotional Speech Dataset OAF_hole_anger.wav
Toronto Emotional Speech Dataset OAF_chain_neutral.wav
Toronto Emotional Speech Dataset OAF_deep_fear.wav
Toronto Emotional Speech Dataset OAF_chain_sad.wav
Toronto Emotional Speech Dataset OAF_deep_pleasant_surprise.wav
Toronto Emotional Speech Dataset OAF_kick_fear.wav
Toronto Emotional Speech Dataset OAF_limb_fear.wav
Toronto Emotional Speech Dataset OAF_kick_happy.wav
Toronto Emotional Speech Dataset OAF_limb_disgust.wav
Toronto Emotional Speech Dataset OAF_kick_anger.wav
Toronto Emotional Speech Dataset OAF_beg_fear.wav
Toronto Emotional Speech Dataset OAF_numb_fear.wav
Music made with Bugbrand Board Weevil circuit board
VOICE 3
Months later, when the researchers are analysing their data, they begin to uncover these embedded memories.
Should they keep them hidden?
DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav
Or share them with the world? Either way, the researchers know that the memories are now a part of the dataset, and always will be.
AUDIOSET: Natural sounds >Water >Ocean >Waves, Surf #2777. DCASE Sound Event Detection: Office Live Testing Dataset: keys15.wav
SCENE 8 – IN THE REAL WORLD
Music made with samples from 19th century French music boxes
VOICE 1
Imagine someone standing on an otherwise empty stage.
Echo, as if in a large empty room
It's mostly dark with blue and purple tones. It's glossy. Big red letters, there's a D, an E, and a T.
Music made with Bugbrand Board Weevil circuit board cuts in
VOICE 4
Everything in the real world is being recreated in the virtual world. The metaverse. The metaverse, a persistent digital universe that mirrors our world, but is becoming as diverse and awe-inspiring as the natural world. And you'll be able to do everything you do in the real world in the virtual world, and more.
In the real world, you can touch a rock, and in the metaverse, you can touch a rock. You can pick it up, you can throw it, you can break it. You can interact with it in ways that are impossible in the real world.
In the real world, you can buy a house, and in the metaverse, you can buy a house. But in the metaverse, you can also buy a moon, a sun, or a star. You can buy anything you can imagine, and more.
In the real world you can say a word. But in the metaverse you can own every word you speak. Or pay rent to the owner or maybe buy the licensing rights to the word and give you a steady income stream to support your use of other people's word. The possibilities are endless.
I make 20 words every day, just in case. And you can too.
VOICE 1
Say the word DEAKSPOOK.
VOICE 3
DEAKSPOOK.
VOICE 1
Say the word SOCKEDGEND.
VOICE 6
SOCKEDGEND.
VOICE 1
Say the word WAKE.
VOICE 2
WAKE.
VOICE 1
Say the word WORD.
VOICE 4
WORD.
VOICE 1
Say the word SARCASTICALLY.
VOICE 3
SARCASTICALLY.
VOICE 1
Say the word VOICE.
VOICE 6
Voice.
VOICE 1
Say the word AHHHH.
Sustained vowels from Consensus Auditory-Perceptual Evaluation of Voice Dataset
LOOP TO START
COLOPHON
Title: After words, 2022.
Artist Details: Machine Listening (Sean Dockray, James Parker, Joel Stern).
Medium: 8 channel sound installation and printed material
Duration: 18 mins.
Researched, written and produced: Sean Dockray, James Parker, Joel Stern.
Voices: Mark Andrejevic, Sean Dockray, Jake Goldenfein, Roslyn Orlando, James Parker, Thao Phan, Joel Stern.
Design: Stuart Geddes.
This work contains audio material from the following datasets: Consensus Auditory-Perceptual Evaluation of Voice Dataset (2009), Toronto Emotional Speech Dataset (2010), DCASE Sound Event Detection: Office Live Testing Dataset (2013), DCASE Synthetic Audio Sound Event Detection: Training and Development Dataset (2016), Google AudioSet (2017).